Parser-Based Retraining for Domain Adaptation of Probabilistic Generators

Authors

Deirdre Hogan

Jennifer Foster

Joachim Wagner

Josef van Genabith

Abstract

While the effect of domain variation on Penn-Treebank-trained probabilistic parsers has been investigated in previous work, we study its effect on a Penn-Treebank-trained probabilistic generator. We show that applying the generator to data from the British National Corpus (BNC) causes a performance drop, from a BLEU score of 0.66 on the standard Wall Street Journal (WSJ) test set to 0.54 on our BNC test set. We develop a generator retraining method in which the domain-specific training data is produced automatically from the output of a state-of-the-art parser. The retraining method recovers a substantial portion of the performance drop (0.07 of the 0.12 BLEU points lost), resulting in a generator that achieves a BLEU score of 0.61 on our BNC test data.
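The abstract combines two technical steps. First, target-domain training pairs are built automatically from parser output. Second, the generator's output is scored with BLEU. The Python sketch below illustrates both steps under stated assumptions and is not the authors' implementation.

`build_retraining_data`, `corpus_bleu` and the `parse` callable are hypothetical names introduced here. The abstract does not specify:
- which parser was used,
- the representation the generator consumes,
- whether the parser-derived pairs were mixed with the original WSJ training data,
- which BLEU configuration was used.

The BLEU function assumes the standard unsmoothed corpus-level BLEU-4 with uniform weights and a single reference per sentence.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

# Hypothetical placeholder for whatever structure the parser emits and the
# generator consumes (e.g. a tree); left abstract in this sketch.
Structure = object
Pair = Tuple[Structure, List[str]]


def build_retraining_data(
    source_pairs: Sequence[Pair],
    target_sentences: Iterable[List[str]],
    parse: Callable[[List[str]], Optional[Structure]],
) -> List[Pair]:
    """Pair each raw target-domain sentence with a parser's analysis of it.

    Assumptions: `parse` is a hypothetical stand-in for a trained parser that
    returns None on failure (such sentences are skipped), and the resulting
    pairs are appended to the original source-domain pairs.
    """
    auto_pairs: List[Pair] = []
    for tokens in target_sentences:
        structure = parse(tokens)
        if structure is not None:
            # The parser output becomes the generator input; the original
            # sentence becomes the target string the generator should produce.
            auto_pairs.append((structure, tokens))
    return list(source_pairs) + auto_pairs


def _ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(
    hypotheses: List[List[str]], references: List[List[str]], max_n: int = 4
) -> float:
    """Unsmoothed corpus-level BLEU, one reference per hypothesis, uniform weights."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = _ngrams(hyp, n)
            ref_counts = _ngrams(ref, n)
            # Clipped ("modified") matches: an n-gram counts at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any zero n-gram precision makes the geometric mean 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

For example, take a four-token hypothesis that exactly matches the first half of an eight-token reference. Every n-gram precision is perfect, but the brevity penalty lowers the score to exp(-1), about 0.368. Because the sketch is unsmoothed, any sentence shorter than four tokens scores 0 when evaluated on its own. Scores are therefore best read at the corpus level, where n-gram counts are pooled across sentences.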

Similar articles

Parser Adaptation to the Biomedical Domain without Re-Training

We present a distributional approach to the problem of inducing parameters for unseen words in probabilistic parsers. Our KNN-based algorithm uses distributional similarity over an unlabelled corpus to match unseen words to the most similar seen words, and can induce parameters for those unseen words without retraining the parser. We apply this to domain adaptation for three different parsers t...

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of treebanks is becoming more widely appreciated in linguistics research as a whole. F...

Adapting a Lexicalized-Grammar Parser to Contrasting Domains

Most state-of-the-art wide-coverage parsers are trained on newspaper text and suffer a loss of accuracy in other domains, making parser adaptation a pressing issue. In this paper we demonstrate that a CCG parser can be adapted to two new domains, biomedical text and questions for a QA system, by using manually-annotated training data at the POS and lexical category levels only. This approach ac...

A New Framework for Domain Adaptation without Model Retraining

We propose a principled and effective domain adaptation framework that pursues the goal of Open Domain NLP (train once, test anywhere). Most domain adaptation frameworks adapt the models trained on the source domain data by retraining them on target domains (with a mix of labeled and unlabeled data). However, it is time consuming to retrain big models or pipeline systems, and may not even be feas...

Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles

We present a data-driven variant of the LR algorithm for dependency parsing, and extend it with a best-first search for probabilistic generalized LR dependency parsing. Parser actions are determined by a classifier, based on features that represent the current state of the parser. We apply this parsing framework to both tracks of the CoNLL 2007 shared task, in each case taking advantage of mult...

Publication date: 2008